BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores
Abstract
This paper presents BDDT-SCC, a task-parallel runtime system for non-cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer (SCC). The BDDT-SCC runtime performs dynamic dependence analysis and automatic synchronization, and executes OpenMP-Ss tasks on a non-cache-coherent architecture. We design a runtime that uses fast on-chip intercore communication with small messages. At the same time, we use non-coherent shared memory to avoid large core-to-core data transfers that would incur a high volume of unnecessary copying. We evaluate BDDT-SCC on a set of representative benchmarks in terms of task granularity, locality, and communication. We find that memory locality and allocation play a very important role in performance, because the architecture of the SCC memory controllers can create strong contention effects. We suggest patterns that improve memory locality, and thus application performance, and measure their impact.
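The abstract states that the runtime discovers dependences between tasks dynamically and synchronizes them automatically. As a rough illustration only, the Python sketch below shows one common way block-level dynamic dependence analysis can work. Each task declares byte-range footprints with an access mode. The tracked memory is split into fixed-size blocks, and each block remembers its last writer and the readers since that write. From this state the tracker derives read-after-write, write-after-write, and write-after-read edges. The block size and the names `DependenceTracker`, `spawn`, `complete`, and `ready` are hypothetical. This is not the BDDT-SCC implementation, which must also exchange this metadata between cores via messages on the SCC.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

BLOCK_SIZE = 64  # hypothetical block granularity in bytes

IN, OUT, INOUT = "in", "out", "inout"


@dataclass
class BlockState:
    last_writer: Optional[int] = None               # task that last wrote this block
    readers: Set[int] = field(default_factory=set)  # readers since that write


class DependenceTracker:
    """Illustrative block-level dynamic dependence analysis (hypothetical API)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, BlockState] = defaultdict(BlockState)
        self.successors: Dict[int, Set[int]] = defaultdict(set)
        self.pending: Dict[int, int] = {}  # unresolved predecessors per live task
        self.done: Set[int] = set()        # completed tasks create no new edges

    def _edge(self, pred: Optional[int], task: int) -> None:
        # Record "task waits for pred", skipping self-edges, finished tasks, duplicates.
        if pred is None or pred == task or pred in self.done:
            return
        if task not in self.successors[pred]:
            self.successors[pred].add(task)
            self.pending[task] += 1

    def spawn(self, task: int, footprint: Iterable[Tuple[int, int, str]]) -> None:
        """Register a task in program order; footprint is (offset, size, mode) ranges."""
        self.pending.setdefault(task, 0)
        for offset, size, mode in footprint:
            first, last = offset // BLOCK_SIZE, (offset + size - 1) // BLOCK_SIZE
            for b in range(first, last + 1):
                state = self.blocks[b]
                self._edge(state.last_writer, task)  # RAW for reads, WAW for writes
                if mode in (OUT, INOUT):
                    for reader in state.readers:
                        self._edge(reader, task)     # WAR
                    state.last_writer = task
                    state.readers = set()
                else:
                    state.readers.add(task)

    def ready(self) -> List[int]:
        # Live tasks with no unresolved predecessors may run now.
        return sorted(t for t, n in self.pending.items() if n == 0)

    def complete(self, task: int) -> None:
        # Release successors; each becomes ready when its counter reaches zero.
        for succ in self.successors.pop(task, set()):
            self.pending[succ] -= 1
        self.pending.pop(task, None)
        self.done.add(task)


if __name__ == "__main__":
    t = DependenceTracker()
    t.spawn(0, [(0, 128, OUT)])   # writes blocks 0-1
    t.spawn(1, [(64, 64, IN)])    # reads block 1: RAW on task 0
    t.spawn(2, [(0, 64, IN)])     # reads block 0: RAW on task 0
    t.spawn(3, [(64, 64, OUT)])   # writes block 1: WAW on 0, WAR on 1
    t.spawn(4, [(256, 64, IN)])   # block 4: independent
    print(t.ready())  # [0, 4]
    t.complete(0)
    print(t.ready())  # [1, 2, 4]
    t.complete(1)
    print(t.ready())  # [2, 3, 4]
```

Tracking state per block rather than per exact address range keeps each footprint check proportional to the number of blocks it touches. The cost is false dependences when unrelated data share a block, which is one reason the granularity is worth tuning.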
Similar sources
BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism
We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without havin...
A Coherent and Managed Runtime for ML on the SCC
Intel’s Single-Chip Cloud Computer (SCC) is a many-core architecture which stands out due to its complete lack of cache-coherence and the presence of fast, on-die interconnect for inter-core messaging. Cache-coherence, if required, must be implemented in software. Moreover, the amount of shared memory available on the SCC is very limited, requiring stringent management of resources even in the ...
A Case for Fine-Grain Adaptive Cache Coherence
As transistor density continues to grow geometrically, processor manufacturers are already able to place a hundred cores on a chip (e.g., Tilera TILE-Gx 100), with massive multicore chips on the horizon. Programmers now need to invest more effort in designing software capable of exploiting multicore parallelism. The shared memory paradigm provides a convenient layer of abstraction to the progra...
Myrmics: Scalable, Dependency-aware Task Scheduling on Heterogeneous Manycores
Task-based programming models have become very popular, as they offer an attractive solution to parallelize serial application code with task and data annotations. They usually depend on a runtime system that schedules the tasks to multiple cores in parallel while resolving any data hazards. However, existing runtime system implementations are not ready to scale well on emerging manycore proces...
Predictable and Parallel Execution of Real-Time Applications on Cache-Coherent Multicores
We present an approach for designing and deploying real-time applications onto shared-bus multicore platforms with cache coherency. We model an application as a directed acyclic graph, and partition it into scheduling intervals. These are then mapped onto multiple cores and scheduled for parallel execution with the objective to reduce the worst-case response time of the application. To accompli...
Journal:
- CoRR
Volume: abs/1606.04288
Pages: -
Published: 2016